CASCADED DOMINO FOUR-TO-TWO REDUCER

CIRCUIT AND METHOD 

FIELD OF THE INVENTION 

Embodiments of the present invention relate to reducer circuits. In particular, 
embodiments of the present invention relate to the topology of cascaded domino four-to- 
two reducers. 

BACKGROUND 

Digital electronic devices such as microprocessors often contain numerous 
components that may perform sub-functions for the device. For example, the arithmetic 
logic unit (ALU) of a microprocessor typically contains one or more adders that receive a 
number of digital inputs and that output the sum of these inputs. As another example, an 
electronic device may contain multipliers that receive a number of digital inputs and output 
the result of a multiplication function performed on these inputs. Digital circuits such as 
adders and multipliers may themselves be made up of smaller digital circuits or logic gates 
such as, for example, a reducer. A reducer receives a number of input bits and provides sum 
and carry bits as outputs. For example, a three-to-two reducer may receive three input bits 
and provide a sum bit (i.e., the least significant bit of the sum of the three input bits) and a carry bit (indicating whether the
addition of the three input bits generates a carry out) as outputs. A four-to-two reducer may
receive four input bits and provide a sum bit and carry bit as outputs. As would be 
appreciated by a person of skill in the art, such a four-to-two reducer may also receive a carry 
in bit and provide an intermediate carry out bit (which may be absorbed by a neighboring 
four-to-two reducer), but such bits are not counted as part of the "four-to-two" because for 
counting purposes they cancel each other out. 
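The reducer behavior described above can be illustrated with a short Python sketch. It models logic values only, not circuit timing, and the function name `three_to_two` is hypothetical. The loop checks that, for every input combination, the sum bit plus twice the carry bit equals the arithmetic sum of the three input bits.

```python
from itertools import product


def three_to_two(a: int, b: int, c: int) -> tuple[int, int]:
    """Hypothetical logic-level model of a three-to-two reducer: returns (sum, carry)."""
    s = a ^ b ^ c                          # sum bit: low-order bit of a + b + c
    carry = (a & b) | (a & c) | (b & c)    # carry bit: 1 when two or more inputs are 1
    return s, carry


# Exhaustive check over all eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    s, carry = three_to_two(a, b, c)
    assert a + b + c == s + 2 * carry
```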
The component circuits in digital devices often use domino logic. A domino circuit
is a type of circuit that is arranged in stages (e.g., logic gates), with the outputs from one stage
used as inputs to the next stage. The clock used with a domino circuit typically is delayed
for each of the individual stages to provide a set-up time for the stages. The individual
domino logic gates typically have one or more precharge blocks, which force the circuit to
a known state during a precharge phase of a clock, and one or more evaluation blocks, which
provide output values during an evaluation phase of the clock. Domino circuits generally
have a static stage between the domino stages. For example, the domino circuit may have
an inverter or a static complementary metal-oxide-semiconductor
(CMOS) gate between the domino stages. Another example is the zipper domino circuit,
which has a P-channel metal-oxide-semiconductor (PMOS) gate between the domino stages.
In a cascaded domino circuit, the outputs from one N-channel metal-oxide-semiconductor
(NMOS) domino gate (i.e., a gate with NMOS transistors in the evaluation block) are directly
connected to the inputs of another NMOS domino gate. Thus, a cascaded domino circuit
does not have any inverters, static stages, or PMOS gates in the critical path of the logic.
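The precharge and evaluation phases can be sketched behaviorally in Python. This is an illustration under stated assumptions, not a circuit simulation: the class `DominoGate` is hypothetical, a clock value of 0 is assumed to select the precharge phase and 1 the evaluation phase, and the dynamic node is assumed to precharge high. The sketch shows why sequencing matters: once the node discharges during evaluation, it cannot recover until the next precharge, even if the inputs change.

```python
from typing import Callable


class DominoGate:
    """Hypothetical behavioral model of one domino gate's dynamic node."""

    def __init__(self, pulldown: Callable[..., bool]):
        self.pulldown = pulldown  # truthy when the NMOS evaluation block conducts
        self.node = 1             # dynamic node as a logic value

    def step(self, clock: int, *inputs: int) -> int:
        if clock == 0:
            self.node = 1         # precharge phase: node forced to a known high state
        elif self.pulldown(*inputs):
            self.node = 0         # evaluation phase: conducting network discharges node
        # Otherwise the node keeps its current value.
        return self.node


g = DominoGate(lambda a, b: a and b)
print(g.step(0, 1, 1))  # precharge: 1
print(g.step(1, 1, 1))  # evaluate with both inputs high: node discharges to 0
print(g.step(1, 0, 0))  # inputs fall during evaluation, but the node stays 0
```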

Four-to-two reducers have not been constructed as cascaded domino circuits. 
Domino four-to-two reducers have been constructed by using three-to-two reducers, but such 
four-to-two reducers have used static CMOS stage(s) of logic between the three-to-two 
reducers. The static stages in these prior four-to-two reducers have an effect on the clocking 
of the circuit and, as is known in the art, a circuit may not operate correctly if it is not 
adequately sequenced. Thus, a topology for adequately sequencing a domino four-to-two 
reducer without static stages has not been known. 



DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a partial block diagram of a symmetric differential domino four-to-two 
reducer according to an embodiment of the present invention. 

FIG. 2 is a flow diagram of a method of providing a four-to-two reducer function 
according to an embodiment of the present invention. 

FIG. 3 is a partial block diagram of a symmetric carry generate gate according to 
another embodiment of the present invention. 

FIG. 4 is a partial block diagram of the set-reset latch shown in FIG. 1 according to 
an embodiment of the present invention. 

DETAILED DESCRIPTION 

Embodiments of the present invention provide topologies for cascaded domino four- 
to-two reducers. The present invention eliminates static CMOS stages in the four-to-two 
reducer by sequentially clocking the differential logic stages with a small delay between 
stages. In an embodiment, the delay between stages is approximately the delay of an inverter 
with a fanout of two, a delay which depends upon the process technology. A person of skill 
in the art would appreciate that an inverter has a fanout of 2 if the load on the output is two 
times the load on the input. 

According to one embodiment, an example of which is shown in FIG. 1, the four-to- 
two reducer is comprised of back-to-back three-to-two reducers. That is, outputs of the first 
three-to-two reducer may be directly connected to the second three-to-two reducer without 
passing through a static logic stage. As used herein, two components A and B are said to be 
"directly connected" if there is a logic path between the two components that does not have 
any other components (e.g., gates or transistors) between components A and B but may 
include lead lines or connector lines between the components. Two components that are 
directly connected may be said to be "back-to-back." 



In a further embodiment, each three-to-two reducer is comprised of (1) a differential
domino exclusive-OR (XOR) gate to provide a sum bit output, and (2) a differential three-
input carry generate gate to provide a carry bit output. A differential circuit is a circuit that
has two complementary sets of input and output terminals. In a differential logic gate, the
first set of input and output terminals may be referred to as the "true" inputs and outputs, and
the second set may be referred to as the "complement" inputs and outputs. For example, a
differential three-to-two reducer may have three true data inputs and three complement data
inputs, a true sum output and a complement sum output, and a true carry output and a
complement carry output. A true input and the corresponding complement input may be
referred to as a single "differential input." Similarly, a true output and the corresponding
complement output may be referred to as a single "differential output."
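As an illustration of true and complement rails, the following Python sketch models a differential three-to-two reducer at the logic level. The helper names are hypothetical. It relies on a property of the two functions involved: inverting all three inputs of a three-input XOR, or of a three-input majority (carry) function, inverts its output. Computing the complement outputs by applying the same functions to the complement inputs therefore keeps each output pair complementary, which the loop checks exhaustively.

```python
from itertools import product


def xor3(x: int, y: int, z: int) -> int:
    return x ^ y ^ z


def maj3(x: int, y: int, z: int) -> int:
    return (x & y) | (x & z) | (y & z)


def diff_three_to_two(a, na, b, nb, c, nc):
    """Hypothetical differential reducer: returns ((sum, neg_sum), (carry, neg_carry))."""
    # True outputs from true inputs; complement outputs from complement inputs.
    return (xor3(a, b, c), xor3(na, nb, nc)), (maj3(a, b, c), maj3(na, nb, nc))


for a, b, c in product((0, 1), repeat=3):
    (s, ns), (cy, ncy) = diff_three_to_two(a, 1 - a, b, 1 - b, c, 1 - c)
    assert ns == 1 - s and ncy == 1 - cy  # each output pair stays complementary
```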

The true and complement outputs of the differential gates in a circuit should begin to
switch with the same edge rate and should not be susceptible to pattern dependence. An
embodiment of the present invention meets these criteria by using symmetric differential
XOR gates and symmetric differential carry generate gates in the four-to-two reducer. A
symmetric gate may be characterized by having the load or capacitance for the true inputs
to a symmetric gate being substantially the same as the load for the complement inputs to the
gate. In addition, a symmetric gate may be characterized by the pull down strength for the
true outputs of the symmetric gate (i.e., the resistance of the transistors pulling down the
outputs) being substantially the same as the pull down strength for the complement output(s),
and the pull down strength for the true inputs being substantially the same as the pull down
strength for the complement inputs. The Miller coupling may also be the same for the true
inputs and complement inputs to the gate. In addition, the output drive strength may be the
same for the true and complement outputs of the symmetric gate.

FIG. 1 is a partial block diagram of a symmetric differential domino four-to-two
reducer 100 according to an embodiment of the present invention. Four-to-two reducer 100
has four true data inputs labeled a in 111, b in 113, c in 115, and d in 117. Four-to-two
reducer 100 also has four complement data inputs labeled negative a 112, negative b 114,
negative c 116, and negative d 118. In addition, four-to-two reducer 100 has a true carry
input (151) and a complement carry input (152). Four-to-two reducer 100 has two true carry
outputs (121 and 191) and two complement carry outputs (122 and 192). Outputs 121 and
122 are intermediate carry outputs. Four-to-two reducer 100 also has a true sum output 168
and a complement sum output 167. The inputs and outputs of four-to-two reducer 100 are
each a single bit value. As would be appreciated by a person of skill in the art, a bit value
is a voltage range that represents a logical value. The complement inputs receive the
complement of the corresponding true inputs, and the complement outputs provide
the complement of the corresponding true outputs. For example, if a input 111 receives a value
of logic 1, then negative a input 112 will receive the value of logic 0. Two stages of a
multistage clock (first clock 101 and second clock 102) are input to four-to-two reducer 100.
In an embodiment, second clock 102 is a slightly delayed version of first clock 101. In an
embodiment, the delay is the delay of an inverter with a fanout of 2. In an embodiment, the
delay between clocks is 12 picoseconds (12 ps).

In this embodiment, four-to-two reducer 100 is comprised of back-to-back three-to-
two reducers. Outputs from first three-to-two reducer 120 are directly connected to second
three-to-two reducer 150. First three-to-two reducer 120 is comprised of first XOR 130 and
first carry generate gate 140. Similarly, second three-to-two reducer 150 is comprised of second
XOR 160 and second carry generate gate 170. Second XOR gate 160 may provide the true and
complement sum output bits for four-to-two reducer 100, and second carry generate gate 170
may provide carry generate outputs for four-to-two reducer 100. First carry generate gate 140
may, in addition, provide carry output 148 and negative carry output 147, which are the
intermediate carry outputs of four-to-two reducer 100. In an embodiment, such as shown in
FIG. 3, carry generate gates 140 and 170 are symmetric. In a further embodiment, XOR gates
130 and 160 are also symmetric. In this further embodiment, XOR gates 130 and 160 may be
any type of known or newly designed three-input differential XOR gates.

The topology of four-to-two reducer 100 will now be described in more detail. Data
inputs 111 to 116 are connected respectively to inputs a 131, negative a 132, b 133, negative
b 134, c 135, and negative c 136 of first XOR gate 130. Data inputs 111 to 116 are also
connected respectively to inputs a 141, negative a 142, b 143, negative b 144, c 145, and
negative c 146 of first carry generate gate 140. First clock 101 is input to first XOR 130 and
first carry generate 140. First XOR 130 outputs a sum 138 and a negative sum 137, and first
carry generate 140 outputs a carry 148 and a negative carry 147. Sum 138 of first XOR 130
is connected to a data input of second XOR 160 (c 165) and a data input of second carry
generate gate 170 (c 175), and negative sum 137 of first XOR 130 is connected to a
complement data input of second XOR 160 (negative c 166) and a complement data input of
second carry generate 170 (negative c 176). Carry output 148 from first carry generate 140
is connected to carry out 121 for four-to-two reducer 100, and negative carry output 147 from
first carry generate 140 is connected to negative carry out 122 for four-to-two reducer 100.
Thus, the carry outputs (true and complement) of first carry generate 140 provide the
intermediate carry output bits for four-to-two reducer 100. The true and complement carry
in bits for four-to-two reducer 100 (carry in 151 and negative carry in 152) are connected
respectively to inputs of second XOR 160 (b 163 and negative b 164) as well as to inputs of
second carry generate gate 170 (b 173 and negative b 174). Finally, true and complement
inputs d 117 and negative d 118 of four-to-two reducer 100 are connected to respective
inputs of second XOR 160 (a 161 and negative a 162) and second carry generate gate 170
(a 171 and negative a 172).
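The connections above can be summarized in a logic-level Python sketch of four-to-two reducer 100. The function and variable names are hypothetical (chosen to echo the reference numerals), only the true rails are modeled, and clocking is ignored. The final loop checks the arithmetic relationship the topology implies: the five input bits, weighted equally, equal the final sum bit plus twice the two carry bits.

```python
from itertools import product


def maj3(x: int, y: int, z: int) -> int:
    """Three-input carry generate function: 1 when two or more inputs are 1."""
    return (x & y) | (x & z) | (y & z)


def reducer_100(a: int, b: int, c: int, d: int, carry_in: int) -> dict[str, int]:
    """Hypothetical logic-level model of the FIG. 1 wiring (true rails only)."""
    # First three-to-two reducer 120: XOR 130 and carry generate gate 140 see a, b, c.
    sum_138 = a ^ b ^ c
    carry_148 = maj3(a, b, c)               # routed to intermediate carry out 121
    # Second three-to-two reducer 150: input a <- d 117, b <- carry in 151, c <- sum 138.
    sum_168 = d ^ carry_in ^ sum_138
    carry_178 = maj3(d, carry_in, sum_138)
    return {"sum_168": sum_168, "carry_178": carry_178, "carry_148": carry_148}


for a, b, c, d, cin in product((0, 1), repeat=5):
    out = reducer_100(a, b, c, d, cin)
    assert a + b + c + d + cin == out["sum_168"] + 2 * (out["carry_178"] + out["carry_148"])
```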

Second XOR 160 provides a sum output 168 and a negative sum output 167. The
sum outputs of second XOR 160 may be the sum outputs of four-to-two reducer 100.
Similarly, second carry generate gate 170 provides a carry output 178 and a negative carry
output 177. The carry outputs of second carry generate gate 170 may be the second carry bits
that are output from four-to-two reducer 100. FIG. 1 shows set-reset (S/R) latch 180 and set-
reset latch 190 coupled to four-to-two reducer 100. In particular, set-reset latch 180 has a
sum input 181 coupled to sum output 168 of second XOR 160, a negative sum input 182
coupled to negative sum output 167 of second XOR 160, and a sum output 184. Similarly,
set-reset latch 190 has a carry input 191 coupled to carry output 178 of second carry
generate gate 170, a negative carry input 192 coupled to negative carry output 177 of second
carry generate gate 170, and a carry output 194. In an embodiment, set-reset latch 180 and
set-reset latch 190 act as a dual rail domino to static converter and may be used to convert
the outputs of four-to-two reducer 100 from domino logic to static logic. In another
embodiment, a set dominant latch or other circuit may be used to perform this conversion
function, the choice of which depends upon the desired use of the output.

The operation of four-to-two reducer 100 may be described with reference to the
following truth tables. These truth tables show the results that may be output from the
embodiment shown in FIG. 1 based on various inputs. For the sake of simplicity, these truth
tables only show the results for the various possible true input states. A person of skill in the
art may easily derive the complement inputs and outputs based on the truth tables below. Also
for simplicity, the first table shows the states where carry in 151 is low (i.e., logic 0), and the
second table shows the states where carry in 151 is high (i.e., logic 1).



Truth Table for Four-to-two Reducer 100
(Part 1, carry in = 0)
[Table: columns include b 113, c 115, d 117, and carry in 151]

Truth Table for Four-to-two Reducer 100
(Part 2, carry in = 1)
[Table]

As can be seen from the truth tables above, the value output at sum 138 is the XOR
of bits a 111, b 113, and c 115. The value output at carry 148 is 1 if and only if
two or more of bits a 111, b 113, and c 115 are 1. The value output at sum 168 is the
XOR of bits d 117, sum 138, and carry in 151. The value output at carry 178 is 1 if
and only if two or more of bits d 117, sum 138, and carry in 151 are 1.
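Because the relations above fully determine the outputs, the rows of both truth tables can be regenerated mechanically. The following Python sketch does so for the true rails only; the function name `table_rows`, the column selection, and the layout are assumptions and may not match the original tables.

```python
from itertools import product


def table_rows(carry_in: int):
    """Yield (a, b, c, d, sum 138, carry 148, sum 168, carry 178) for one carry in 151."""
    for a, b, c, d in product((0, 1), repeat=4):
        sum_138 = a ^ b ^ c
        carry_148 = int(a + b + c >= 2)           # two or more of a, b, c are 1
        sum_168 = d ^ sum_138 ^ carry_in
        carry_178 = int(d + sum_138 + carry_in >= 2)
        yield (a, b, c, d, sum_138, carry_148, sum_168, carry_178)


header = ("a 111", "b 113", "c 115", "d 117", "sum 138", "carry 148", "sum 168", "carry 178")
for cin in (0, 1):
    print(f"Part {cin + 1}, carry in = {cin}")
    print(" | ".join(header))
    for row in table_rows(cin):
        # Center each bit under its column heading.
        print(" | ".join(f"{v:^{len(h)}}" for v, h in zip(row, header)))
```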

FIG. 2 is a flow diagram of a method of providing a four-to-two reducer function 
according to an embodiment of the present invention. This method will be discussed with 
reference to the embodiment shown in FIG. 1, but of course this method could also be 
performed with other embodiments. In this embodiment, three pairs of true and complement
data bits (e.g., bits 111 to 116 of FIG. 1) are received at a first differential domino three-to-
two reducer (201 of FIG. 2) such as first three-to-two reducer 120. In an embodiment, the
data bits may be received at an XOR and a carry generate gate in the first three-to-two
reducer (e.g., first XOR 130 and first carry generate 140). In an embodiment, the carry
generate gate in the first three-to-two reducer is a symmetric carry generate gate.

After a first clock (e.g., first clock 101) cycles from an evaluation phase to a
precharge phase (202 of FIG. 2), a first pair of true and complement sum bits is output from
the first three-to-two reducer to a second differential domino three-to-two reducer (203), such
as second three-to-two reducer 150. In an embodiment, the first pair of true and complement
sum bits is output directly from the first three-to-two reducer to the second three-to-two
reducer. A pair of true and complement fourth data bits and a pair of true and complement
carry in bits are also received at the second three-to-two reducer (204). The fourth data bits
and the carry in bits may be received at the second three-to-two reducer before, after, or at
the same time as the three pairs of data bits are received at the first three-to-two reducer. In
an embodiment, the sum bits, carry in bits, and fourth data bits may be received at an XOR
and a carry generate gate in the second three-to-two reducer (e.g., second XOR 160 and
second carry generate 170). In an embodiment, the carry generate gate in the second
three-to-two reducer is a symmetric carry generate gate. After a second clock (e.g., second
clock 102) cycles from an evaluation phase to a precharge phase (205 of FIG. 2), a second
pair of true and complement sum bits and a pair of carry output bits are output
from the second three-to-two reducer (206). The second clock may be a delayed version of
the first clock. For example, the second clock may be delayed from the first clock by
approximately the delay of an inverter.

The second pair of true and compliment sum bits and the pair of carry output bits may 
be the output from the four-to-two reducer. In an embodiment, the second pair of true and 
complement sum bits and the pair of carry output bits are converted from domino logic to
static logic by a dual rail domino to static converter or other device. An example of a set-reset
latch that may be used is shown in FIG. 4. 

FIG. 3 is a partial block diagram of symmetric carry generate gate 140 according to 
an embodiment of the present invention. Symmetric carry generate gate 170 of FIG. 1 may 
have the same topology as carry generate gate 140. As discussed above, carry generate gate 
140 has three true inputs (a 141, b 143, and c 145) and three complement inputs (negative a
142, negative b 144, and negative c 146). Carry generate gate 140 receives first clock 101.
Carry generate gate 140 has a carry output 148 and a negative carry output 147.

As shown in FIG. 3, carry generate 140 consists of a plurality of connected transistors
that may be logically divided into a precharge block 330, a keeper 340, a first evaluation
block 350, a second evaluation block 360, and a clock or footer transistor 371. In this
embodiment, precharge block 330 comprises two PMOS transistors 331 and 332, and keeper
340 comprises two PMOS transistors 343 and 344. The source terminals ("sources") of
transistors 331, 332, 343, and 344 are connected to Vcc. The gates of the transistors in
precharge block 330 (331 and 332) are connected to clock 101. The drain terminal ("drain")
of transistor 331 is connected to carry output 148, and the drain of transistor 332 is connected
to negative carry output 147. The transistors in keeper 340 (transistors 343 and 344) are
cross-coupled. That is, the drain of transistor 343 is connected to the gate of transistor 344,
and the drain of transistor 344 is connected to the gate of transistor 343. In addition, the
drain of transistor 343 is connected to carry output 148 and the drain of transistor 344 is
connected to negative carry output 147.

In this embodiment, footer transistor 371 and the transistors in first evaluation block
350 and second evaluation block 360 are NMOS transistors. Footer transistor 371 has its
gate connected to clock 101, its drain connected to ground, and its source connected to the
drains of three transistors in first evaluation block 350 (352, 353, and 354) and to the drains
of three transistors in second evaluation block 360 (363, 364, and 365). The transistors in
the evaluation blocks form a number of stacks from footer transistor 371 to either carry
output 148 or negative carry output 147. Thus, footer transistor 371 provides a path to
ground from the evaluation stacks. Transistors 355 and 352 are one example of such a stack.
Transistor 355 has its drain connected to the source of transistor 352 and its source connected
to carry output 148. Similarly, transistor 361 has its drain connected to the source of
transistor 363 and its source connected to negative carry output 147. In addition, transistor
356 has its drain connected to the sources of transistors 353 and 354 and has its source
connected to carry output 148. Finally, transistor 362 has its drain connected to the sources
of transistors 364 and 365 and has its source connected to negative carry output 147. In this
embodiment, the number of transistors in each of the stacks connecting footer transistor 371
to one of the outputs (147 and 148) is the same (i.e., two transistors).

The gates of the transistors in the evaluation blocks are connected to the data inputs 
to effectuate the desired carry generate function. Input a 141 is connected to the gates of 
transistors 352 and 353, and input negative a 142 is connected to the gates of transistors 363 
and 364. Input b 143 is connected to the gates of transistors 355 and 354, and input negative 
b 144 is connected to the gates of transistors 361 and 365. Input c 145 is connected to the
gate of transistor 356, and input negative c 146 is connected to the gate of transistor 362.
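The evaluation stacks can be checked with a switch-level Python sketch. Each NMOS transistor is treated as a switch that conducts when its gate input is 1, and footer transistor 371 is assumed to conduct while first clock 101 is high (the evaluation phase). The dictionaries transcribe the connections described above; the names `GATE`, `PATHS`, and `pulled_down` are hypothetical. The sketch reports only which output node the stacks discharge; it does not model precharge block 330 or keeper 340.

```python
from itertools import product

# Input signal driving the gate of each NMOS transistor in FIG. 3.
GATE = {352: "a", 353: "a", 354: "b", 355: "b", 356: "c",
        363: "na", 364: "na", 361: "nb", 365: "nb", 362: "nc"}

# Series paths from footer transistor 371 to each output node.
PATHS = {
    148: [(352, 355), (353, 356), (354, 356)],  # first evaluation block 350
    147: [(363, 361), (364, 362), (365, 362)],  # second evaluation block 360
}


def pulled_down(node: int, inputs: dict[str, int], clock: int = 1) -> bool:
    """True if some conducting path discharges `node` through footer 371."""
    if not clock:
        return False  # footer off: no path to ground
    return any(all(inputs[GATE[t]] for t in path) for path in PATHS[node])


for a, b, c in product((0, 1), repeat=3):
    inp = {"a": a, "b": b, "c": c, "na": 1 - a, "nb": 1 - b, "nc": 1 - c}
    down_148, down_147 = pulled_down(148, inp), pulled_down(147, inp)
    assert down_148 == (a + b + c >= 2)  # stacks on node 148 implement the carry function
    assert down_148 != down_147          # exactly one output node discharges

# Symmetry: both blocks use five transistors arranged in two-transistor stacks.
for node in PATHS:
    assert len({t for path in PATHS[node] for t in path}) == 5
    assert all(len(path) == 2 for path in PATHS[node])
```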

Carry generate gate 140 of FIG. 3 is symmetric. The first evaluation block 350 and
second evaluation block 360 each have the same number of transistors (i.e., five transistors).
In carry generate gate 140, the gate of each of the transistors in both the first evaluation block
350 and second evaluation block 360 is connected to one of the six data inputs (141 to 146)
to carry generate gate 140. Another characteristic of the topology of carry generate gate 140
is that second evaluation block 360 has the same number of transistors in parallel relationship
as first evaluation block 350 and the same number of transistors in serial relationship as
first evaluation block 350. In an embodiment, all of the transistors in carry generate gate 140
are 1 micron in size. Of course, in other embodiments the transistors may have other sizes,
and some or all of the transistors may be different sizes than other transistors. In an
embodiment, the corresponding transistors in each evaluation block have the same size.

A person of skill in the art would appreciate that the truth tables above describe the 
operation of carry generate gate 140. Of course, the present invention is not limited to a 
carry generate such as shown in FIG. 3. 

FIG. 4 is a partial block diagram of set-reset latch 190 shown in FIG. 1 according
to an embodiment of the present invention. Set-reset latch 180 of FIG. 1 may have the
same topology as set-reset latch 190. Set-reset latch 190 has two inverters (401 and 402)
and an inverting half tri-state gate 403. In addition, set-reset latch 190 has an NMOS
transistor 411 and a PMOS transistor 412. Carry input 191 is connected to the gate of
transistor 412 as well as to an input of half tri-state gate 403 so that half tri-state gate 403
may be pulled down by the carry 191 data path. Negative carry 192 is connected to the input
of inverter 401. The output of inverter 401 is connected to the gate of transistor 411. The
drain of transistor 411 is connected to ground, and the source of transistor 412 is connected
to Vcc. The output of half tri-state gate 403 is connected to carry output 193 and to the input
of inverter 402. The output of inverter 402 is connected to an input of half tri-state gate 403.
Finally, the drain of transistor 412 and the source of transistor 411 are connected to carry
output 193.
A person of skill in the art would appreciate that set-reset latch 190 is a static latch
that performs as a dual rail to static converter. A differential input (carry 191 and negative
carry 192) is converted to a single carry output 193. When carry 191 is high (and thus
negative carry 192 is low), carry output 193 is high. Conversely, when carry 191 is low
(and thus negative carry 192 is high), carry output 193 is low.
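The conversion behavior can be sketched in Python at the logic level. The class name `DualRailToStatic` is hypothetical, and the sketch encodes only the two cases stated above. As an assumption consistent with a static latch, it holds its previous output whenever the two rails are equal (for example, while both are precharged); the transistor-level behavior of FIG. 4 is not modeled.

```python
class DualRailToStatic:
    """Hypothetical logic-level model of set-reset latch 190 as a dual rail to static converter."""

    def __init__(self, initial: int = 0):
        self.out = initial  # carry output 193

    def update(self, carry: int, neg_carry: int) -> int:
        if carry == 1 and neg_carry == 0:
            self.out = 1    # differential 1 sets the output
        elif carry == 0 and neg_carry == 1:
            self.out = 0    # differential 0 resets the output
        # Assumption: equal rails leave the stored value unchanged.
        return self.out


latch = DualRailToStatic()
print(latch.update(1, 0))  # set
print(latch.update(1, 1))  # rails equal: output held
print(latch.update(0, 1))  # reset
```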

The present invention provides a cascaded differential domino four-to-two reducer.
The four-to-two reducer of the present invention is constructed of back-to-back differential
domino three-to-two reducers. The cascaded differential domino four-to-two reducer of the
present invention is faster than prior four-to-two reducers because the circuit
disclosed does not need to wait for the input to reach Vcc/2 (the gate threshold) before 
beginning to switch. The differential logic may act as a sense amp and allow for the clock 
to drive the transition with small differentials on the inputs. In addition, embodiments of the 
four-to-two reducer circuit of the present invention do not have any stacked PMOS devices. 

Four-to-two reducers designed according to embodiments of the present invention 
may be used as a building block to create a variety of more complex circuits such as 
multipliers and redundant form adders. By eliminating the static stages in the reducer, use 
of the present invention may decrease the delay of the more complex circuit. In addition,
leakage current and size can be reduced by the present invention because the threshold 
voltage (Vt) of transistors in the critical path may be lowered. 

Several embodiments of the present invention are specifically illustrated and/or 
described herein. However, it will be appreciated that modifications and variations of the 
present invention are covered by the above teachings and within the purview of the appended 
claims without departing from the spirit and intended scope of the invention. 